69 research outputs found

    BioConceptVec: creating and evaluating literature-based biomedical concept embeddings on a large scale

    Capturing the semantics of related biological concepts, such as genes and mutations, is of significant importance to many research tasks in computational biology such as protein-protein interaction detection, gene-drug association prediction, and biomedical literature-based discovery. Here, we propose to leverage state-of-the-art text mining tools and machine learning models to learn the semantics via vector representations (also known as embeddings) of over 400,000 biological concepts mentioned across all PubMed abstracts. Our learned embeddings, namely BioConceptVec, can capture related concepts based on their surrounding contextual information in the literature, which goes beyond exact term match or co-occurrence-based methods. BioConceptVec has been thoroughly evaluated in multiple bioinformatics tasks consisting of over 25 million instances from nine different biological datasets. The evaluation results demonstrate that BioConceptVec has better performance than existing methods in all tasks. Finally, BioConceptVec is made freely available to the research community and general public via https://github.com/ncbi-nlp/BioConceptVec. Comment: 33 pages, 6 figures, 7 tables, accepted by PLOS Computational Biology.
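As a rough illustration of how concept embeddings can surface related concepts, the following minimal sketch ranks neighbors by cosine similarity. The concept names and three-dimensional vectors in `TOY_EMBEDDINGS` are invented for illustration and are not taken from BioConceptVec; cosine similarity is a common choice for comparing embeddings, not necessarily the measure used in the paper's evaluation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest(query, embeddings, k=2):
    """Return the k concepts most similar to `query`, excluding the query itself."""
    q = embeddings[query]
    scored = [
        (name, cosine(q, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors; real embeddings have far more dimensions.
TOY_EMBEDDINGS = {
    "gene_A": [1.0, 0.0, 0.0],
    "gene_B": [0.9, 0.1, 0.0],
    "drug_X": [0.0, 1.0, 0.0],
}

if __name__ == "__main__":
    for name, score in nearest("gene_A", TOY_EMBEDDINGS):
        print(name, round(score, 3))
```

Concepts that appear in similar contexts end up with similar vectors, so a neighbor lookup like this can return related concepts even when they never co-occur in the same abstract.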

    Vitamin D and Exercise Are Major Determinants of Natural Killer Cell Activity, Which Is Age- and Gender-Specific

    Background: The coronavirus-19 disease (COVID-19) pandemic reminds us of the importance of immune function, even in immunologically normal individuals. Multiple lifestyle factors are known to influence the immune function. Objective: The aim was to investigate the association between NK cell activity (NKA) and multiple factors including vitamin D, physical exercise, age, and gender. Methods: This was a cross-sectional association study using health check-up and NKA data of 2,095 subjects collected from 2016 to 2018 in a health check-up center in the Republic of Korea. NKA was measured using the interferon-γ (IFN-γ) stimulation method. The association of NKA with 25-(OH)-vitamin D (25(OH)D) and other factors was investigated by multiple logistic regression analysis. Results: The average age of subjects was 48.8 ± 11.6 years (52.9% of subjects were female). Among 2,095 subjects, 1,427 had normal NKA (NKA ≥ 500 pg IFN-γ/mL), while 506 had low NKA (100 ≤ NKA < 500 pg/mL), and 162 subjects had very low NKA (NKA < 100 pg/mL). Compared to men with low 25(OH)D serum level (< 20 ng/mL), vitamin D replete men (30–39.9 ng/mL) had significantly lower risk of very low NKA (OR: 0.358; 95% CI: 0.138, 0.929; P = 0.035). In women, both low exercise (OR: 0.529; 95% CI: 0.299, 0.939; P = 0.030) and medium to high exercise (OR: 0.522; 95% CI: 0.277, 0.981; P = 0.043) decreased the risk compared to lack of physical exercise. Interestingly, in men and women older than 60 years, physical exercise significantly decreased the risk. Older age was associated with increased risk of very low NKA in men, but not in women. Conclusion: Physical exercise and vitamin D were associated with NKA in a gender- and age-dependent manner. Age was a major risk factor of very low NKA in men but not in women.

    Scaling up data curation using deep learning: An application to literature triage in genomic variation resources.

    Manually curating biomedical knowledge from publications is necessary to build a knowledge-based service that provides highly precise and organized information to users. The process of retrieving relevant publications for curation, also known as document triage, is usually carried out by querying and reading articles in PubMed. However, this query-based method often yields unsatisfactory precision and recall, and it is difficult to manually generate optimal queries. To address this, we propose a machine-learning-assisted triage method. We collect previously curated publications from two databases, UniProtKB/Swiss-Prot and the NHGRI-EBI GWAS Catalog, and use them as a gold-standard dataset for training deep learning models based on convolutional neural networks. We then use the trained models to classify and rank new publications for curation. For evaluation, we apply our method to the real-world manual curation process of UniProtKB/Swiss-Prot and the GWAS Catalog. We demonstrate that our machine-assisted triage method outperforms the current query-based triage methods, improves efficiency, and enriches curated content. Our method achieves a precision 1.81 and 2.99 times higher than that obtained by the current query-based triage methods of UniProtKB/Swiss-Prot and the GWAS Catalog, respectively, without compromising recall. In fact, our method retrieves many additional relevant publications that the query-based method of UniProtKB/Swiss-Prot could not find. As these results show, our machine learning-based method can make the triage process more efficient and is being implemented in production so that human curators can focus on more challenging tasks to improve the quality of knowledge bases.
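The precision and recall comparison described above can be made concrete with a small sketch. The publication IDs and the two retrieved lists below are hypothetical and chosen only to show how a precision ratio at equal recall is computed; they do not reproduce the paper's data or its reported 1.81 and 2.99 ratios.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved list against a gold-standard set."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical gold standard: publications a curator judged relevant.
RELEVANT = {"p1", "p2", "p3"}

# Hypothetical outputs of a query-based search and of a trained classifier.
QUERY_BASED = ["p1", "p2", "p3", "p4", "p5", "p6"]
MODEL_BASED = ["p1", "p2", "p3", "p7"]

if __name__ == "__main__":
    q_prec, q_rec = precision_recall(QUERY_BASED, RELEVANT)
    m_prec, m_rec = precision_recall(MODEL_BASED, RELEVANT)
    print("query-based:", q_prec, q_rec)
    print("model-based:", m_prec, m_rec)
    print("precision ratio:", m_prec / q_prec)
```

In this toy case both methods find every relevant publication, so recall is equal, while the model-based list contains fewer irrelevant items and therefore has higher precision.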

    Downregulation of LOC441461 Promotes Cell Growth and Motility in Human Gastric Cancer

    Gastric cancer is a common tumor, with a high mortality rate. The severity of gastric cancer is assessed by TNM staging. Long noncoding RNAs (lncRNAs) play a role in cancer treatment; investigating the clinical significance of novel biomarkers associated with TNM staging, such as lncRNAs, is important. In this study, we investigated the association between the expression of the lncRNA LOC441461 and gastric cancer stage. LOC441461 expression was lower in stage IV than in stages I, II, and III. The depletion of LOC441461 promoted cell proliferation, cell cycle progression, apoptosis, cell motility, and invasiveness. LOC441461 downregulation increased the epithelial-to-mesenchymal transition, as indicated by increased TRAIL signaling and decreased RUNX1 interactions. The interaction of the transcription factors RELA, IRF1, ESR1, AR, POU5F1, TRIM28, and GATA1 with LOC441461 affected the degree of the malignancy of gastric cancer by modulating gene transcription. The present study identified LOC441461 and seven transcription factors as potential biomarkers and therapeutic targets for the treatment of gastric cancer

    Overall system architecture.

    We implemented both the one-stage and the two-stage method. (a) Data generation part. (b) One-stage method. Five-class type classifier for the one-stage method. (c) Two-stage method. The DDI detection classifier distinguishes positive DDI instances from negative instances. The DDI type classifier receives the predicted positive instances from the detection classifier as a testing set.
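The two-stage flow in the caption can be sketched as a detection step followed by a type step that only sees predicted positives. This is a minimal illustration under stated assumptions: `toy_detect` and `toy_type` are keyword rules standing in for trained classifiers, and the labels `type_1`, `type_2`, and `negative` are placeholders, since the caption does not name the classes.

```python
def two_stage(instances, detect, classify_type, negative_label="negative"):
    """Label each instance: run `detect` first, then `classify_type` on predicted positives only."""
    labels = []
    for instance in instances:
        if detect(instance):
            labels.append(classify_type(instance))
        else:
            labels.append(negative_label)
    return labels


def toy_detect(sentence):
    """Stand-in detection classifier: a keyword rule, not a trained model."""
    return "inhibits" in sentence or "increases" in sentence


def toy_type(sentence):
    """Stand-in type classifier with placeholder labels."""
    return "type_1" if "inhibits" in sentence else "type_2"


if __name__ == "__main__":
    sentences = [
        "drug a inhibits drug b",
        "drug a increases levels of drug b",
        "drug a and drug b were measured",
    ]
    print(two_stage(sentences, toy_detect, toy_type))
```

A one-stage variant would replace both functions with a single classifier that chooses among the type labels and the negative label directly.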

    BOSS: context-enhanced search for biomedical objects

    Background: Many academic search solutions exist, and most sit at one of two ends of a spectrum: general-purpose search systems and domain-specific "deep" search systems. General-purpose search systems, such as PubMed, offer a flexible query interface but return a list of matching documents that users must read through to find answers to their queries. "Deep" search systems, such as PPI Finder and iHOP, return precompiled results in a structured way, but their results are often available only within predefined contexts. To alleviate these problems, we introduce a new search engine, BOSS (Biomedical Object Search System). Methods: Unlike conventional search systems, BOSS indexes segments rather than documents. A segment is a Maximal Coherent Semantic Unit (MCSU), such as a phrase, clause, or sentence, that is semantically coherent in the given context (e.g., biomedical objects or their relations). For a user query, BOSS finds all matching segments, identifies the objects appearing in those segments, and aggregates the segments for each object. Finally, it returns a ranked list of the objects along with their matching segments. Results: A working prototype of BOSS is available at http://boss.korea.ac.kr. The current version of BOSS has indexed the abstracts of more than 20 million articles published over the 16 years from 1996 to 2011 across all science disciplines. Conclusion: BOSS fills the gap between the two ends of the spectrum by allowing users to pose context-free queries and by returning a structured set of results. Furthermore, BOSS scales well, as conventional document search engines do, because it is designed to use a standard document-indexing model with minimal modifications. With these features, BOSS advances the state of traditional solutions for searching biomedical information.
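The query flow described in the Methods (match segments, identify objects, aggregate per object, rank) can be sketched as follows. This is a toy illustration, not the BOSS implementation: the segment records in `SEGMENTS` are invented, matching is plain all-terms word containment, and ranking by number of matching segments is an assumption, since the abstract does not specify the ranking function.

```python
from collections import defaultdict


def search(query, segments):
    """Return (object, matching segment texts) pairs, most matching segments first."""
    terms = query.lower().split()
    by_object = defaultdict(list)
    for segment in segments:
        words = segment["text"].lower().split()
        # A segment matches when every query term occurs in it.
        if all(term in words for term in terms):
            for obj in segment["objects"]:
                by_object[obj].append(segment["text"])
    # Assumed ranking: more matching segments first, ties broken by name.
    return sorted(by_object.items(), key=lambda item: (-len(item[1]), item[0]))


# Hypothetical segments with their annotated biomedical objects.
SEGMENTS = [
    {"text": "BRCA1 regulates DNA repair", "objects": ["BRCA1"]},
    {"text": "TP53 and BRCA1 control DNA repair pathways", "objects": ["TP53", "BRCA1"]},
    {"text": "aspirin reduces inflammation", "objects": ["aspirin"]},
]

if __name__ == "__main__":
    for obj, texts in search("dna repair", SEGMENTS):
        print(obj, len(texts), texts)
```

Because the result is grouped by object rather than by document, the user sees each candidate answer once, together with the evidence segments that support it.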

    Comparison between our proposed model and existing models.

    Comparison between our proposed model and existing models.